Results 1 - 14 of 14
1.
Nucleic Acids Res ; 52(D1): D194-D202, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37587690

ABSTRACT

N6-Methyladenosine (m6A) is one of the most abundant internal chemical modifications on eukaryote mRNA and is involved in numerous essential molecular functions and biological processes. To facilitate the study of this important post-transcriptional modification, we present here m6A-Atlas v2.0, an updated version of m6A-Atlas. It was expanded to include a total of 797 091 reliable m6A sites from 13 high-resolution technologies and two single-cell m6A profiles. Additionally, three methods (exomePeaks2, MACS2 and TRESS) were used to identify >16 million m6A enrichment peaks from 2712 MeRIP-seq experiments covering 651 conditions in 42 species. Quality control results of MeRIP-seq samples were also provided to help users to select reliable peaks. We also estimated the condition-specific quantitative m6A profiles (i.e. differential methylation) under 172 experimental conditions for 19 species. Further, to provide insights into potential functional circuitry, the m6A epitranscriptomics were annotated with various genomic features, interactions with RNA-binding proteins and microRNA, potentially linked splicing events and single nucleotide polymorphisms. The collected m6A sites and their functional annotations can be freely queried and downloaded via a user-friendly graphical interface at: http://rnamd.org/m6a.


Subjects
Genetic Databases, RNA Methylation, Messenger RNA, Transcriptome, RNA Splicing, Messenger RNA/chemistry, Messenger RNA/metabolism, Post-Transcriptional RNA Processing
2.
Brief Bioinform ; 24(3), 2023 05 19.
Article in English | MEDLINE | ID: mdl-36932656

ABSTRACT

Post- and co-transcriptional RNA modifications are found to play various roles in regulating essential biological processes at all stages of RNA life. Precise identification of RNA modification sites is thus crucial for understanding the related molecular functions and specific regulatory circuitry. To date, a number of computational approaches have been developed for in silico identification of RNA modification sites; however, most of them require learning from base-resolution epitranscriptome datasets, which are generally scarce and available only for a limited number of experimental conditions, and predict only a single modification, even though there are multiple inter-related RNA modification types available. In this study, we proposed AdaptRM, a multi-task computational method for synergetic learning of multi-tissue, type and species RNA modifications from both high- and low-resolution epitranscriptome datasets. By taking advantage of adaptive pooling and multi-task learning, the newly proposed AdaptRM approach outperformed the state-of-the-art computational models (WeakRM and TS-m6A-DL) and two other deep-learning architectures based on Transformer and ConvMixer in three different case studies for both high-resolution and low-resolution prediction tasks, demonstrating its effectiveness and generalization ability. In addition, by interpreting the learned models, we unveiled for the first time the potential association between different tissues in terms of epitranscriptome sequence patterns. AdaptRM is available as a user-friendly web server from http://www.rnamd.org/AdaptRM together with all the codes and data used in this project.


Subjects
Computational Biology, RNA, RNA/genetics, Methylation, RNA Sequence Analysis/methods, Computational Biology/methods
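
The abstract above names adaptive pooling as one ingredient of AdaptRM but does not describe the layer itself. As a rough illustration of the general idea only, the sketch below reduces an input of any length to a fixed number of values by taking the maximum over evenly spaced bins. The function name, the bin rule and the toy score lists are assumptions made for this example and are not taken from the AdaptRM code.

```python
def adaptive_max_pool_1d(values, output_size):
    """Pool a non-empty sequence of any length into exactly `output_size` values."""
    n = len(values)
    if n == 0 or output_size <= 0:
        raise ValueError("need a non-empty input and a positive output size")
    pooled = []
    for i in range(output_size):
        # Bin i spans [floor(i*n/out), ceil((i+1)*n/out)), so bins are never empty.
        start = (i * n) // output_size
        end = ((i + 1) * n + output_size - 1) // output_size
        pooled.append(max(values[start:end]))
    return pooled


# Hypothetical per-position scores for a short and a long region.
short_region = [0.1, 0.9, 0.3, 0.2]
long_region = [0.1, 0.2, 0.8, 0.4, 0.3, 0.7, 0.05, 0.6, 0.5, 0.2]

print(adaptive_max_pool_1d(short_region, 2))
print(adaptive_max_pool_1d(long_region, 2))
```

Both regions end up as two values, which is what lets a single downstream layer accept inputs of different lengths, such as single-base windows and peak-sized regions.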
3.
Nucleic Acids Res ; 51(D1): D1388-D1396, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36062570

ABSTRACT

Recent advances in epitranscriptomics have unveiled functional associations between RNA modifications (RMs) and multiple human diseases, but distinguishing the functional or disease-related single nucleotide variants (SNVs) from the majority of 'silent' variants remains a major challenge. We previously developed the RMDisease database for unveiling the association between genetic variants and RMs concerning human disease pathogenesis. In this work, we present RMDisease v2.0, an updated database with expanded coverage. Using deep learning models and from 873 819 experimentally validated RM sites, we identified a total of 1 366 252 RM-associated variants that may affect (add or remove an RM site) 16 different types of RNA modifications (m6A, m5C, m1A, m5U, Ψ, m6Am, m7G, A-to-I, ac4C, Am, Cm, Um, Gm, hm5C, D and f5C) in 20 organisms (human, mouse, rat, zebrafish, maize, fruit fly, yeast, fission yeast, Arabidopsis, rice, chicken, goat, sheep, pig, cow, rhesus monkey, tomato, chimpanzee, green monkey and SARS-CoV-2). Among them, 14 749 disease- and 2441 trait-associated genetic variants may function via the perturbation of epitranscriptomic markers. RMDisease v2.0 should serve as a useful resource for studying the genetic drivers of phenotypes that lie within the epitranscriptome layer circuitry, and is freely accessible at: www.rnamd.org/rmdisease2.


Subjects
Factual Databases, Post-Transcriptional RNA Processing, Animals, Humans, Phenotype, SARS-CoV-2/genetics, SARS-CoV-2/metabolism, Epigenomics
4.
Mol Ther Nucleic Acids ; 30: 337-345, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36381577

ABSTRACT

DNA methylation is one of the earliest epigenetic regulation mechanisms studied extensively, and it is critical for normal development, diseases, and gene expression. As a recently identified chemical modification of DNA, N4-acetyldeoxycytosine (4acC) was shown to be abundant in Arabidopsis and highly associated with gene expression and actively transcribed genes. Precise identification of 4acC is essential for studying its biological function. We proposed the 4acCPred, the first computational framework for predicting 4acC-carrying regions from Arabidopsis genomic DNA sequences. Since the existing 4acC data are not precise for a specific base but only report regions that are hundreds of bases long, we formulated the task as a weakly supervised learning problem and built 4acCPred using a multi-instance-based deep neural network. Both cross-validation and independent testing on the four datasets under different conditions show promising performance, with mean areas under the receiver operating characteristic curve (AUCs) of 0.9877 and 0.9899, respectively. 4acCPred also provides motif mining through model interpretation. The motifs found by 4acCPred are consistent with existing knowledge, indicating that the model successfully captured real biological signals. In addition, a user-friendly web server was built to facilitate 4acC prediction, motif visualization, and data access. Our framework and web server should serve as useful tools for 4acC research.
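
The weakly supervised, multi-instance formulation described above can be pictured as follows: a long labelled region is cut into short overlapping windows (instances), each window is scored, and the region-level (bag) score is pooled from the instance scores. This is a minimal sketch of that idea only. The window size, the stride, the max-pooling rule and the toy scorer are all assumptions for illustration and do not reproduce the 4acCPred network.

```python
def split_into_instances(sequence, window=5, stride=2):
    """Cut a region into overlapping fixed-length windows (the 'instances')."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) < window:
        return [sequence]
    last_start = len(sequence) - window
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the region is covered
    return [sequence[s:s + window] for s in starts]


def toy_instance_score(instance):
    """Hypothetical stand-in for a learned scorer: the fraction of 'C' bases."""
    return instance.count("C") / len(instance)


def bag_score(sequence, window=5, stride=2):
    """Max pooling: a region is as positive as its most positive instance."""
    instances = split_into_instances(sequence, window, stride)
    return max(toy_instance_score(inst) for inst in instances)


print(split_into_instances("ACGTACGTAC"))
print(bag_score("ACGTACGTAC"))
```

Only the bag score is compared with the region-level label during training, which is why base-resolution annotations are not needed in this setting.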

5.
Int J Mol Sci ; 23(21), 2022 Nov 04.
Article in English | MEDLINE | ID: mdl-36362279

ABSTRACT

One of the most abundant non-canonical bases widely occurring on various RNA molecules is 5-methyluridine (m5U). Recent studies have revealed its influences on the development of breast cancer, systemic lupus erythematosus, and the regulation of stress responses. The accurate identification of m5U sites is crucial for understanding their biological functions. We propose RNADSN, the first transfer learning deep neural network that learns common features between tRNA m5U and mRNA m5U to enhance the prediction of mRNA m5U. Without seeing the experimentally detected mRNA m5U sites, RNADSN has already outperformed the state-of-the-art method, m5UPred. Using mRNA m5U classification as an additional layer of supervision, our model achieved another distinct improvement and presented an average area under the receiver operating characteristic curve (AUC) of 0.9422 and an average precision (AP) of 0.7855. The robust performance of RNADSN was also verified by cross-technical and cross-cellular validation. The interpretation of RNADSN also revealed the sequence motif of common features. Therefore, RNADSN should be a useful tool for studying m5U modification.


Subjects
Computer Neural Networks, Transfer RNA, Messenger RNA/genetics, Transfer RNA/genetics, Uridine
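
Several abstracts in this listing report the area under the ROC curve (AUC) and average precision (AP). For readers who want to see what those two numbers measure, here is a small sketch in plain Python. The labels and scores are invented for the example, and the AP function assumes there are no tied scores; it is not the evaluation code used in any of the listed papers.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    if not precisions:
        raise ValueError("need at least one positive example")
    return sum(precisions) / len(precisions)


# Invented toy data: 1 = modified site, 0 = unmodified site.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

AUC is insensitive to class balance, while AP drops quickly when positives are rare, which is one reason the two figures reported for the same model can differ substantially.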
6.
Nucleic Acids Res ; 50(18): 10290-10310, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36155798

ABSTRACT

As the most pervasive epigenetic mark present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation regulates all stages of RNA life in various biological processes and disease mechanisms. Computational methods for deciphering RNA modification have achieved great success in recent years; nevertheless, their potential remains underexploited. One reason for this is that existing models usually consider only the sequence of transcripts, ignoring the various regions (or geography) of transcripts such as 3'UTR and intron, where the epigenetic mark forms and functions. Here, we developed three simple yet powerful encoding schemes for transcripts to capture the submolecular geographic information of RNA, which is largely independent from sequences. We show that m6A prediction models based on geographic information alone can achieve comparable performances to classic sequence-based methods. Importantly, geographic information substantially enhances the accuracy of sequence-based models, enables isoform- and tissue-specific prediction of m6A sites, and improves m6A signal detection from direct RNA sequencing data. The geographic encoding schemes we developed have exhibited strong interpretability, and are applicable to not only m6A but also N1-methyladenosine (m1A), and can serve as a general and effective complement to the widely used sequence encoding schemes in deep learning applications concerning RNA transcripts.


Subjects
Deep Learning, Long Noncoding RNA, 3' Untranslated Regions, Methylation, Protein Isoforms/genetics, RNA/genetics, RNA/metabolism, Messenger RNA/genetics
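
The abstract does not spell out its three encoding schemes, so the following is not one of them. It is only a hypothetical illustration of what sequence-independent "geographic" information about a transcript position could look like: which region the position falls in, plus how far into that region it lies. The region names, the feature layout and the example lengths are assumptions made for this sketch.

```python
REGIONS = ("5'UTR", "CDS", "3'UTR")


def encode_position(position, region_lengths):
    """Return [one-hot region..., relative offset in region] for a 0-based transcript position."""
    if len(region_lengths) != len(REGIONS):
        raise ValueError("expected one length per region")
    start = 0
    for index, length in enumerate(region_lengths):
        end = start + length
        if start <= position < end:
            one_hot = [1.0 if i == index else 0.0 for i in range(len(REGIONS))]
            return one_hot + [(position - start) / length]
        start = end
    raise ValueError("position lies outside the transcript")


# Hypothetical transcript: 100 nt 5'UTR, 900 nt CDS, 500 nt 3'UTR.
lengths = (100, 900, 500)
print(encode_position(50, lengths))
print(encode_position(1250, lengths))
```

Note that no nucleotide is read anywhere in the function; the features depend only on transcript structure, which is the property the abstract highlights.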
7.
Article in English | MEDLINE | ID: mdl-36096444

ABSTRACT

As the most pervasive epigenetic marker present on mRNA and lncRNA, N6-methyladenosine (m6A) RNA methylation has been shown to participate in essential biological processes. Recent studies have revealed the distinct patterns of m6A methylome across human tissues, and a major challenge remains in elucidating the tissue-specific presence and circuitry of m6A methylation. We present here a comprehensive online platform m6A-TSHub for unveiling the context-specific m6A methylation and genetic mutations that potentially regulate m6A epigenetic mark. m6A-TSHub consists of four core components, including: (1) m6A-TSDB, a comprehensive database of 184,554 functionally annotated m6A sites derived from 23 human tissues and 499,369 m6A sites from 25 tumor conditions, respectively; (2) m6A-TSFinder, a web server for high-accuracy prediction of m6A methylation sites within a specific tissue from RNA sequences, which was constructed using multi-instance deep neural networks with gated attention; (3) m6A-TSVar, a web server for assessing the impact of genetic variants on tissue-specific m6A RNA modifications; and (4) m6A-CAVar, a database of 587,983 The Cancer Genome Atlas (TCGA) cancer mutations (derived from 27 cancer types) that were predicted to affect m6A modifications in the primary tissue of cancers. The database should make a useful resource for studying the m6A methylome and the genetic factors of epitranscriptome disturbance in a specific tissue (or cancer type). m6A-TSHub is accessible at www.xjtlu.edu.cn/biologicalsciences/m6ats.

8.
Biomolecules ; 12(7), 2022 06 28.
Article in English | MEDLINE | ID: mdl-35883462

ABSTRACT

The development of high-throughput omics technologies has enabled the quantification of vast amounts of genes and gene products in the whole genome. Pathway enrichment analysis (PEA) provides an intuitive solution for extracting biological insights from massive amounts of data. Topology-based pathway analysis (TPA) represents the latest generation of PEA methods, which exploit pathway topology in addition to lists of differentially expressed genes and their expression profiles. A subset of these TPA methods, such as BPA, BNrich, and PROPS, reconstruct pathway structures by training Bayesian networks (BNs) from canonical biological pathways, providing superior representations that explain causal relationships between genes. However, these methods have never been compared for their differences in the PEA and their different topology reconstruction strategies. In this study, we aim to compare the BN reconstruction strategies of the BPA, BNrich, PROPS, Clipper, and Ensemble methods and their PEA and performance on tumor and non-tumor classification based on gene expression data. Our results indicate that they performed equally well in distinguishing tumor and non-tumor samples (AUC > 0.95) yet with a varying ranking of pathways, which can be attributed to the different BN structures resulting from the different cyclic structure removal strategies. This can be clearly seen from the reconstructed JAK-STAT networks by different strategies. In a nutshell, BNrich, which relies on expert intervention to remove loops and cyclic structures, produces BNs that best fit the biological facts. The plausibility of the Clipper strategy can also be partially explained by intuitive biological rules and theorems. Our results may offer an informed reference for the proper method for a given data analysis task.


Subjects
Neoplasms, Bayes Theorem, Humans, Neoplasms/genetics
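
Bayesian networks must be acyclic, so any pathway with a feedback loop has to lose at least one edge before it can be used as a BN structure. The sketch below shows one generic way to do that, dropping the back edges found by a depth-first search. It is not the strategy of BPA, BNrich, PROPS, Clipper or the Ensemble method, and the three-node loop is a toy example rather than a curated pathway.

```python
def remove_back_edges(graph):
    """Return (dag, removed), where back edges found by a DFS have been dropped."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    color = {node: WHITE for node in nodes}
    dag = {node: [] for node in nodes}
    removed = []

    def visit(node):
        color[node] = GREY
        for target in graph.get(node, []):
            if color[target] == GREY:  # this edge would close a cycle
                removed.append((node, target))
                continue
            dag[node].append(target)
            if color[target] == WHITE:
                visit(target)
        color[node] = BLACK

    for node in sorted(nodes):  # fixed start order keeps the result reproducible
        if color[node] == WHITE:
            visit(node)
    return dag, removed


# Toy feedback loop used only for illustration.
toy_pathway = {"JAK": ["STAT"], "STAT": ["SOCS"], "SOCS": ["JAK"]}
dag, removed = remove_back_edges(toy_pathway)
print(dag, removed)
```

Which edge gets removed depends on where the traversal starts, so two automatic strategies can return different acyclic structures for the same pathway. That is consistent with the abstract's observation that pathway rankings vary with the cycle-removal strategy.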
9.
Front Genet ; 13: 895099, 2022.
Article in English | MEDLINE | ID: mdl-35664332

ABSTRACT

Precise segmentation of chromosomes in real microscope images is important for karyotype analysis. Image segmentation is usually framed as a pixel-level classification task that treats different instances as different classes. Many instance segmentation methods predict the Intersection over Union (IoU) through a head branch to correct the classification confidence. Their effectiveness rests on the correlation between branch tasks. However, none of these methods considers the correlation between input and output in branch tasks. Herein, we propose a chromosome instance segmentation network based on regression correction. First, we adopt two head branches to predict two confidences that are more closely related to localization accuracy and segmentation accuracy, and use them to correct the classification confidence, which reduces the omission of predicted boxes in non-maximum suppression (NMS). Furthermore, an NMS algorithm is designed to screen the target segmentation mask using the IoU of the overlapping instances, which reduces the omission of predicted masks in NMS. Moreover, because the original IoU loss function is not sensitive to wrong segmentation, a K-IoU loss function is defined to strengthen the penalty for wrong segmentation, which rationalizes the loss of mis-segmentation and effectively prevents wrong segmentation. Finally, an ablation experiment is designed to evaluate the effectiveness of the proposed network, which shows that our method can effectively enhance performance in automatic chromosome segmentation tasks and provide a guarantee for end-to-end karyotype analysis.
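
For orientation, the two standard building blocks this abstract starts from are IoU between two boxes and greedy non-maximum suppression (NMS). The sketch below implements only those textbook versions on axis-aligned boxes given as (x1, y1, x2, y2). It does not implement the paper's modified NMS or its K-IoU loss, neither of which is defined in the abstract, and the boxes, scores and the 0.5 threshold are invented for the example.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep the best box, drop any later box that overlaps a kept one too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


# Invented example: two heavily overlapping detections and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))
print(nms(boxes, scores))
```

Here the second box is suppressed because it overlaps the first by about 0.68. With touching or crossing chromosomes, a suppressed box may belong to a genuinely different object, which is the kind of omission the abstract says its corrections aim to reduce.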

10.
Methods ; 203: 62-69, 2022 07.
Article in English | MEDLINE | ID: mdl-35429629

ABSTRACT

The traditional epitranscriptome profiling approach relies on specific antibodies or chemical treatments to capture modified RNA molecules and then applies high-throughput sequencing to identify their transcriptomic locations. However, owing to the lack of suitable or high-quality antibodies, only a small proportion of the 170 documented RNA modifications have been profiled with those approaches. Direct sequencing of native RNA molecules using Oxford Nanopore Technologies (ONT) enables direct inspection of RNA modifications and offers a promising alternative. N6-methyladenosine (m6A) is known to cause characteristic changes and increased base-call errors in ONT signals compared with unmodified adenosines, on the basis of which m6A sites can be identified directly from ONT signals. Meanwhile, a number of studies have shown that it is possible to predict m6A sites from RNA primary sequences. Using the m6A sites revealed by Illumina technology as a benchmark, we showed that the accuracy of ONT-based m6A site prediction can be further increased by integrating additional information from the primary sequences of RNA (AUROC of 0.918), compared with using ONT signals only (AUROC of 0.878 using base-call error features and 0.804 using Tombo features), providing a new perspective for more reliable mining of the relatively noisy ONT signals.


Subjects
Nanopores, RNA, Adenosine/genetics, High-Throughput Nucleotide Sequencing, Methylation, RNA/genetics, RNA Sequence Analysis
11.
Nucleic Acids Res ; 50(D1): D196-D203, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34986603

ABSTRACT

5-Methylcytosine (m5C) is one of the most prevalent covalent modifications on RNA. It is known to regulate a broad variety of RNA functions, including nuclear export, RNA stability and translation. Here, we present m5C-Atlas, a database for comprehensive collection and annotation of RNA 5-methylcytosine. The database contains 166 540 m5C sites in 13 species identified from 5 base-resolution epitranscriptome profiling technologies. Moreover, condition-specific methylation levels are quantified from 351 RNA bisulfite sequencing samples gathered from 22 different studies via an integrative pipeline. The database also presents several novel features, such as the evolutionary conservation of a m5C locus, its association with SNPs, and any relevance to RNA secondary structure. All m5C-atlas data are accessible through a user-friendly interface, in which the m5C epitranscriptomes can be freely explored, shared, and annotated with putative post-transcriptional mechanisms (e.g. RBP intermolecular interaction with RNA, microRNA interaction and splicing sites). Together, these resources offer unprecedented opportunities for exploring m5C epitranscriptomes. The m5C-Atlas database is freely accessible at https://www.xjtlu.edu.cn/biologicalsciences/m5c-atlas.


Subjects
Genetic Databases, Epigenome/genetics, Software, Transcriptome/genetics, 5-Methylcytosine/chemistry, 5-Methylcytosine/metabolism, Humans, MicroRNAs/genetics, Single Nucleotide Polymorphism/genetics, Post-Transcriptional RNA Processing/genetics, RNA Sequence Analysis
12.
Methods ; 203: 226-232, 2022 07.
Article in English | MEDLINE | ID: mdl-34843978

ABSTRACT

With the rapid development of high-throughput sequencing techniques, extensive attention has been paid to epitranscriptomics, which covers more than 150 distinct chemical modifications to date. Among them, N6-methyladenosine (m6A) is the most abundant, and it is significantly related to a variety of biological processes. Meanwhile, maize is the most important food crop and is cultivated throughout the world. Therefore, the study of m6A modification in maize has both economic and academic value. In this research, we proposed a weakly supervised learning model to predict m6A modification status in maize. The proposed model learns from low-resolution epitranscriptome datasets (e.g., MeRIP-seq) and predicts the m6A methylation status of given fragments or regions. Using our prediction model, we further identified trait-associated SNPs that may affect (add or remove) m6A modifications in maize, which may point to potential regulatory mechanisms at the epitranscriptome layer. Additionally, a centralized online platform was developed for m6A study in maize, which contains 58,838 experimentally validated maize m6A-containing regions (including training and testing datasets) and a database of 2,578 predicted trait-associated m6A-affecting maize mutations. Furthermore, an online web server based on the proposed weakly supervised model is available for predicting putative m6A sites from user-uploaded maize sequences, as well as assessing the epitranscriptome impact of user-specified maize SNPs on m6A modification. In all, our work provides a useful resource for the study of m6A RNA methylation in maize. It is freely accessible at www.xjtlu.edu.cn/biologicalsciences/maize.


Subjects
High-Throughput Nucleotide Sequencing, Zea mays, Adenosine/genetics, Adenosine/metabolism, Methylation, Mutation, Zea mays/genetics, Zea mays/metabolism
13.
Bioinformatics ; 37(Suppl_1): i222-i230, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252943

ABSTRACT

MOTIVATION: Increasing evidence suggests that post-transcriptional ribonucleic acid (RNA) modifications regulate essential biomolecular functions and are related to the pathogenesis of various diseases. Precise identification of RNA modification sites is essential for understanding the regulatory mechanisms of RNAs. To date, many computational approaches for predicting RNA modifications have been developed, most of which were based on strong supervision enabled by base-resolution epitranscriptome data. However, high-resolution data may not be available.

RESULTS: We propose WeakRM, the first weakly supervised learning framework for predicting RNA modifications from low-resolution epitranscriptome datasets, such as those generated from acRIP-seq and hMeRIP-seq. Evaluations on three independent datasets (corresponding to three different RNA modification types and their respective sequencing technologies) demonstrated the effectiveness of our approach in predicting RNA modifications from low-resolution data. WeakRM outperformed state-of-the-art multi-instance learning methods for genomic sequences, such as WSCNN, which was originally designed for transcription factor binding site prediction. Additionally, our approach captured motifs that are consistent with existing knowledge, and visualization of the predicted modification-containing regions unveiled the potentials of detecting RNA modifications with improved resolution.

AVAILABILITY AND IMPLEMENTATION: The source code for the WeakRM algorithm, along with the datasets used, are freely accessible at: https://github.com/daiyun02211/WeakRM.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA, Software, Algorithms, Protein Binding, RNA/genetics, RNA/metabolism, RNA Sequence Analysis, Supervised Machine Learning
14.
Nat Commun ; 12(1): 4011, 2021 06 29.
Article in English | MEDLINE | ID: mdl-34188054

ABSTRACT

Recent studies suggest that epi-transcriptome regulation via post-transcriptional RNA modifications is vital for all RNA types. Precise identification of RNA modification sites is essential for understanding the functions and regulatory mechanisms of RNAs. Here, we present MultiRM, a method for the integrated prediction and interpretation of post-transcriptional RNA modifications from RNA sequences. Built upon an attention-based multi-label deep learning framework, MultiRM not only simultaneously predicts the putative sites of twelve widely occurring transcriptome modifications (m6A, m1A, m5C, m5U, m6Am, m7G, Ψ, I, Am, Cm, Gm, and Um), but also returns the key sequence contents that contribute most to the positive predictions. Importantly, our model revealed a strong association among different types of RNA modifications from the perspective of their associated sequence contexts. Our work provides a solution for detecting multiple RNA modifications, enabling an integrated analysis of these RNA modifications, and gaining a better understanding of sequence-based RNA modification mechanisms.


Subjects
Computational Biology/methods, Computer Neural Networks, Post-Transcriptional RNA Processing/genetics, RNA/chemistry, RNA/genetics, Base Sequence, DNA Methylation/genetics, Humans